We analyzed 39 autoimmune disease- and trait-associated SNP sets obtained from Supplemental Table 1 of Farh, K. K.-H., Marson, A., Zhu, J., Kleinewietfeld, M., Housley, W. J., Beik, S., … Bernstein, B. E. (2014). “Genetic and epigenetic fine mapping of causal autoimmune disease variants.” Nature. doi:10.1038/nature13835.
First, we re-created the heatmap of shared genetic features among the autoimmune diseases and traits, that is, counts of genomic elements overlapping between pairs of terms. We used this heatmap as a reference point for comparison with the heatmaps produced by the regulatory similarity analysis.
Although we used 4,498 regulatory datasets from the ENCODE project, processed with GenomeRunner, some regulatory datasets showed no statistically significant enrichment in any of the 39 SNP sets. We removed these datasets as non-informative and kept the remaining 2,969 regulatory datasets.
## [1] 4498 39
## [1] 2969 39
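The filtering step described above can be sketched as follows. This is a minimal illustration in Python, not the code that produced the output above. The significance cutoff `alpha = 0.01`, the function name `filter_informative`, and the toy matrix are assumptions made for the example, since the report does not state the threshold used.

```python
from typing import Dict, List


def filter_informative(
    pvalues: Dict[str, List[float]], alpha: float = 0.01
) -> Dict[str, List[float]]:
    """Keep datasets (rows) that are significant in at least one SNP set.

    P-values may carry a "-" sign to mark depletion, so the absolute
    value is compared against the cutoff. `alpha` is an assumed cutoff.
    """
    return {
        name: row
        for name, row in pvalues.items()
        if any(abs(p) < alpha for p in row)
    }


# Toy matrix: rows are regulatory datasets, columns are SNP sets.
toy = {
    "datasetA": [0.5, -0.8, 1.0],     # nothing significant: dropped
    "datasetB": [0.002, 0.9, -0.4],   # enriched in the first SNP set: kept
    "datasetC": [0.7, -0.0001, 0.3],  # depleted in the second SNP set: kept
}

kept = filter_informative(toy)
print(sorted(kept))
```

Comparing absolute values keeps datasets that are significantly depleted as well as those that are significantly enriched.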
We visualized the matrix of pair-wise Spearman correlation coefficients among the term-specific regulatory enrichment profiles.
We then compared how regulatory similarity correlates with shared genomic feature similarity. The Spearman correlation coefficient between the two is:
## [1] 0.395182
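One way to compare two similarity matrices of this kind is to correlate their off-diagonal entries. The sketch below is written under that assumption. Whether the original analysis used the upper triangle, the full matrix, or another layout is not stated, and the toy matrices and helper names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def upper_triangle(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Toy 3 x 3 matrices (made-up values): regulatory similarity and shared-feature counts.
regulatory = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.3], [0.1, 0.3, 1.0]]
shared = [[0, 12, 1], [12, 0, 4], [1, 4, 0]]

rho = spearman(upper_triangle(regulatory), upper_triangle(shared))
print(rho)
```

Using only the entries above the diagonal avoids counting each pair twice and leaves out the trivial self-similarities.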
The 10 pairs of disease-associated SNP sets that are most similar to each other are listed below. The correlation coefficient column shows the Spearman correlation coefficient between the regulatory enrichment profiles of the two term-specific SNP sets.
##
## --------------------------------------------------------------------------------------------
## Disease 1 Disease 2 Corr. coefficient
## ---------------------------------------------- ------------------------- -------------------
## HDL_cholesterol Triglycerides 0.473
##
## LDL_cholesterol Triglycerides 0.4314
##
## Chronic_kidney_disease Urate_levels 0.3742
##
## HDL_cholesterol LDL_cholesterol 0.3475
##
## Bone_mineral_density Type_2_diabetes 0.3225
##
## Multiple_sclerosis Primary_biliary_cirrhosis 0.316
##
## Alzheimers_combined Type_2_diabetes 0.2999
##
## Liver_enzyme_levels_gamma_glutamyl_transferase Urate_levels 0.2976
##
## Fasting_glucose_related_traits Type_2_diabetes 0.2972
##
## Liver_enzyme_levels_gamma_glutamyl_transferase Platelet_counts 0.2944
## --------------------------------------------------------------------------------------------
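A ranking like the one in the table above can be derived from a symmetric correlation matrix by sorting its unique pairs. The following is a small sketch. The function name `top_pairs` is hypothetical, and the three-term matrix reuses only the lipid-trait coefficients reported in the table.

```python
from typing import List, Sequence, Tuple


def top_pairs(
    names: Sequence[str], corr: Sequence[Sequence[float]], k: int = 10
) -> List[Tuple[str, str, float]]:
    """Return the k most correlated pairs from a symmetric matrix."""
    pairs = [
        (names[i], names[j], corr[i][j])
        for i in range(len(names))
        for j in range(i + 1, len(names))  # each unordered pair once
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)[:k]


names = ["HDL_cholesterol", "LDL_cholesterol", "Triglycerides"]
corr = [
    [1.0, 0.3475, 0.473],
    [0.3475, 1.0, 0.4314],
    [0.473, 0.4314, 1.0],
]

for first, second, value in top_pairs(names, corr, k=2):
    print(first, second, value)
```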
The regulatory similarity dendrogram can be divided into four separate clusters:
## Cluster01 has 14 members
## Platelet_counts
## Liver_enzyme_levels_gamma_glutamyl_transferase
## Red_blood_cell_traits
## LDL_cholesterol
## HDL_cholesterol
## Triglycerides
## Type_2_diabetes
## Fasting_glucose_related_traits
## Bone_mineral_density
## Alzheimers_combined
## Creatinine_levels
## Renal_function_related_traits_BUN
## Urate_levels
## Chronic_kidney_disease
##
## Cluster02 has 9 members
## Multiple_sclerosis
## Kawasaki_disease
## Celiac_disease
## Systemic_lupus_erythematosus
## Psoriasis
## Ulcerative_colitis
## Rheumatoid_arthritis
## Crohns_disease
## Autoimmune_thyroiditis
##
## Cluster03 has 5 members
## Primary_biliary_cirrhosis
## Ankylosing_spondylitis
## Systemic_sclerosis
## Migraine
## Primary_sclerosing_cholangitis
##
## Cluster04 has 11 members
## Juvenile_idiopathic_arthritis
## Atopic_dermatitis
## Alopecia_areata
## C_reactive_protein
## Allergy
## Type_1_diabetes
## Vitiligo
## Behcets_disease
## Progressive_supranuclear_palsy
## Restless_legs_syndrome
## Asthma
##
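Cutting a dendrogram into a fixed number of clusters can be illustrated with a small agglomerative procedure. The report does not state the distance or linkage used, so the choices here (distance = 1 minus correlation, average linkage) are assumptions. The function name and toy matrix are hypothetical.

```python
from typing import List, Sequence


def cluster_average_linkage(
    dist: Sequence[Sequence[float]], k: int
) -> List[List[int]]:
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy correlation matrix with two obvious groups: {0, 1} and {2, 3}.
corr = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
dist = [[1.0 - c for c in row] for row in corr]

groups = cluster_average_linkage(dist, k=2)
print(groups)
```

Stopping the merges once `k` clusters remain gives the same result as cutting the full dendrogram at the height that yields `k` branches.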
We estimated the differences in regulatory associations of term-specific SNP sets between clusters.
The first column shows the names of the regulatory datasets. The following two columns show the average p-values of the associations between the cluster-specific SNP sets and the regulatory datasets. The smaller the p-value, the stronger the association of the SNPs in a cluster with the corresponding regulatory dataset. A “-” sign indicates that an association is underrepresented (depleted). The “adj.P.Val” column shows whether the difference in associations between the clusters is statistically significant. The last column describes the regulatory datasets. The tables are sorted by the “adj.P.Val” column; up to 10 of the most significantly different associations are shown.
## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 116"
##
## ----------------------------------------------------------------------------------------------------------------
## Row.names c1 c2 adj.P.Val V2
## -------------------------------------------------- ---------- --------- ----------- ----------------------------
## wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1 -0.0696 0.0001995 1.664e-05 GM12878 NFIC v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1 -0.1765 0.002657 1.664e-05 GM12878 FOXM1 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 -0.2057 0.00165 1.664e-05 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeOpenChromFaireGm12892Pk -0.2289 0.001429 2.372e-05 GM12892 FAIRE Peaks from
## ENCODE/OpenChrom(UNC)
##
## wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2 -0.1244 0.0004603 2.849e-05 GM12878 PML v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2 -0.04666 6.729e-05 3.617e-05 GM12878 MTA3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 -0.2152 0.001387 3.617e-05 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1 -0.1962 0.002065 3.617e-05 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep1 -0.1173 0.001776 4.267e-05 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeBroadHistoneGm12878H3k9me3StdPk -2.144e-06 7.237e-08 4.636e-05 GM12878 H3K9me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
## ----------------------------------------------------------------------------------------------------------------
##
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 1"
##
## ---------------------------------------------------------------------------------------------------
## Row.names c1 c3 adj.P.Val V2
## ---------------------------------------- ------- --------- ----------- ----------------------------
## wgEncodeBroadHistoneMonocd14ro1746CtcfPk -0.6355 4.879e-06 0.01794 Monocytes CD14+ CTCF Histone
## Mods by ChIP-seq Peaks from
## ENCODE/Broad
## ---------------------------------------------------------------------------------------------------
##
## [1] "c2 vs. c4 , number of degs significant at adj.p.val<0.5: 76"
##
## ----------------------------------------------------------------------------------------------------------
## Row.names c2 c4 adj.P.Val V2
## -------------------------------------------------- --------- ------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.00165 0.8762 0.005057 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1 0.0001995 0.8082 0.005057 GM12878 NFIC v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeOpenChromFaireGm12892Pk 0.001429 0.8088 0.005057 GM12892 FAIRE Peaks from
## ENCODE/OpenChrom(UNC)
##
## wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1 0.002657 0.8098 0.005057 GM12878 FOXM1 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2 0.0004603 0.8089 0.005801 GM12878 PML v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.007786 -0.9993 0.00689 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2 6.729e-05 -0.9847 0.00689 GM12878 MTA3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.001387 0.6498 0.00689 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1 0.002065 0.7675 0.00689 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2 0.007315 0.8856 0.00899 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
## ----------------------------------------------------------------------------------------------------------
##
## [1] "c3 vs. c4 , number of degs significant at adj.p.val<0.5: 1"
##
## --------------------------------------------------------------------------------------------------
## Row.names c3 c4 adj.P.Val V2
## ---------------------------------------- --------- ------ ----------- ----------------------------
## wgEncodeBroadHistoneMonocd14ro1746CtcfPk 4.879e-06 0.7054 0.08407 Monocytes CD14+ CTCF Histone
## Mods by ChIP-seq Peaks from
## ENCODE/Broad
## --------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
##
## ----------------------------
## c1 c2 c3 c4
## -------- ---- ---- ---- ----
## **c1** 0 116 1 0
##
## **c2** 0 0 0 76
##
## **c3** 0 0 0 1
##
## **c4** 0 0 0 0
## ----------------------------
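The signed p-value convention used in the tables above (a “-” sign marks depletion) can be decoded into a direction and a magnitude. This is a minimal sketch. The function name and the choice of a -log10 scale for the magnitude are assumptions made for illustration.

```python
from math import log10
from typing import Tuple


def decode_signed_pvalue(signed_p: float) -> Tuple[str, float]:
    """Split a signed p-value into (direction, -log10 of its magnitude)."""
    direction = "depleted" if signed_p < 0 else "enriched"
    magnitude = abs(signed_p)
    if magnitude == 0:
        # A p-value reported as exactly zero has no finite -log10.
        return direction, float("inf")
    return direction, -log10(magnitude)


# Two values taken from the first row of the c1 vs. c2 table above.
print(decode_signed_pvalue(-0.0696))
print(decode_signed_pvalue(0.0001995))
```

On this scale, larger scores mean stronger associations whether the direction is enrichment or depletion.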
Cluster 2 differed most from cluster 1 and cluster 4. Disease- and trait-associated SNPs from this cluster were enriched in regulatory datasets from B cells: the GM12878 B-lymphoblastoid cell line, other cell lines from the GM family, and CD20+ B lymphocytes yielded most of the association signal.
| Cell_type | Frequency | Factor | Frequency |
|---|---|---|---|
| gm12878 | 75 | dnasei | 45 |
| cd20+ | 12 | pol2 | 25 |
| th1 | 12 | h3k4me3 | 23 |
| gm12892 | 10 | pol2-4h8 | 11 |
| th2 | 10 | faire | 6 |
| dnd41 | 9 | h3k4me2 | 5 |
| gm12865 | 9 | rna-pet | 5 |
| gm12891 | 9 | h3k4me1 | 5 |
| treg | 6 | h3k9ac | 4 |
| gm06990 | 5 | atf2 | 4 |
| cd4+ | 4 | stat5a | 4 |
| gm12864 | 4 | h3k27ac | 4 |
| gm18505 | 3 | mta3 | 4 |
| hela-s3 | 3 | runx3 | 4 |
| gm12875 | 2 | h3k9me3 | 3 |
| gm19193 | 2 | h2az | 3 |
| cd14+ | 2 | h3k27me3 | 3 |
| gm18507 | 2 | p300 | 2 |
| gm15510 | 2 | foxm1 | 2 |
| th17 | 1 | nfatc1 | 2 |
| hcm | 1 | chd1 | 2 |
| nhdf-neo | 1 | bclaf1 | 2 |
| hepg2 | 1 | tblr1 | 2 |
| nh-a | 1 | bhlhe40 | 2 |
| hsmm | 1 | ctcf | 2 |
| k562 | 1 | cnv | 2 |
| gm10847 | 1 | pml | 2 |
| hct-116 | 1 | nfic | 2 |
| raji | 1 | whip | 2 |
| | | nfkb | 2 |
| | | ebf1 | 2 |
| | | h3k79me2 | 1 |
| | | ezh2 | 1 |
| | | mxi1 | 1 |
| | | h4k20me1 | 1 |
| | | junctions | 1 |
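Frequency tables like the one above can be produced by tallying the cell type and factor annotations of the differentially associated datasets. The sketch below assumes each dataset has already been annotated with a `(cell_type, factor)` pair. The function name and the short example list are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def tabulate_frequencies(
    annotations: Iterable[Tuple[str, str]]
) -> Tuple[Counter, Counter]:
    """Count how often each cell type and each factor occurs."""
    cell_counts: Counter = Counter()
    factor_counts: Counter = Counter()
    for cell_type, factor in annotations:
        cell_counts[cell_type] += 1
        factor_counts[factor] += 1
    return cell_counts, factor_counts


# Made-up annotations for four significant datasets.
example = [
    ("gm12878", "runx3"),
    ("gm12878", "nfic"),
    ("gm12892", "faire"),
    ("gm12878", "runx3"),
]

cells, factors = tabulate_frequencies(example)
print(cells.most_common())
print(factors.most_common())
```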
We used the data from Hidalgo CA, Blumm N, Barabasi A-L, Christakis NA. PLoS Computational Biology, 5(4):e1000353, doi:10.1371/journal.pcbi.1000353, available at http://barabasilab.neu.edu/projects/hudine/resource/data/data.html. These data provide co-morbidity measurements for pairs of diseases. We mapped autoimmune disease and trait names to 3-digit ICD-9 codes and evaluated how co-morbidity measurements correlate with regulatory similarity measurements. We used the Phi measure of co-morbidity. The Spearman correlation coefficient between Phi and regulatory similarity is:
## [1] 0.4130129
## [1] "sharedRels correlation with regulatory similarity"
## [1] 0.09761721
## [1] "obsExp correlation with regulatory similarity"
## [1] 0.1953638
## [1] "minMim correlation with regulatory similarity"
## [1] 0.2849605
## [1] "directStr correlation with regulatory similarity"
## [1] 0.2301912
## [1] "relOverlap correlation with regulatory similarity"
## [1] 0.1429562
## [1] "misn correlation with regulatory similarity"
## [1] 0.302173
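For orientation, the phi coefficient for two binary variables (here, presence or absence of two diseases in a patient) can be computed from co-occurrence counts as sketched below. This is the standard statistical definition. That the co-morbidity dataset's Phi values were computed in exactly this way is an assumption, and the counts in the example are made up.

```python
from math import sqrt


def phi_comorbidity(n_both: int, n_i: int, n_j: int, n_total: int) -> float:
    """Phi coefficient from patient counts.

    n_both  : patients with both diseases
    n_i     : patients with disease i
    n_j     : patients with disease j
    n_total : all patients
    """
    denominator = sqrt(n_i * n_j * (n_total - n_i) * (n_total - n_j))
    if denominator == 0:
        raise ValueError("phi is undefined when a disease has 0 or all patients")
    return (n_both * n_total - n_i * n_j) / denominator


# Made-up counts: co-occurrence at the level expected by chance gives phi = 0.
print(phi_comorbidity(n_both=10, n_i=100, n_j=100, n_total=1000))
# Made-up counts: perfect co-occurrence gives phi = 1.
print(phi_comorbidity(n_both=100, n_i=100, n_j=100, n_total=1000))
```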
We also performed regulatory similarity analysis using subsets of regulatory datasets, such as Transcription Factor Binding Sites or Histone Modification Marks. Here, out of all regulatory datasets, we selected only TFBSs.
## [1] 1954 39
## [1] 1259 39
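A category subset such as this one could be selected by matching a marker in the ENCODE track names. The sketch below assumes the names follow the pattern seen in the tables of this report (for example, `Tfbs` or `Histone` embedded in the name). The helper name is hypothetical, and the original selection criterion is not stated.

```python
from typing import List, Sequence


def select_by_category(names: Sequence[str], marker: str) -> List[str]:
    """Keep track names that contain the category marker."""
    return [name for name in names if marker in name]


# Track names taken from the tables in this report.
tracks = [
    "wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2",
    "wgEncodeOpenChromFaireGm12892Pk",
    "wgEncodeBroadHistoneGm12878H3k9me3StdPk",
    "wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk",
]

tfbs = select_by_category(tracks, "Tfbs")
histone = select_by_category(tracks, "Histone")
print(len(tfbs), len(histone))
```

Name matching is only a heuristic: a track such as `wgEncodeBroadHistoneMonocd14ro1746CtcfPk` carries the `Histone` marker although it profiles CTCF.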
Next, we visualized the heatmap of regulatory similarity.
We then checked how well it correlates with the original clustering by shared genetic overlap:
## [1] 0.399781
The 10 pairs of disease-associated SNP sets that are most similar to each other are listed below.
##
## -----------------------------------------------------------------------------------------------
## Disease 1 Disease 2 Corr. coefficient
## ---------------------------------------------- ---------------------------- -------------------
## HDL_cholesterol Triglycerides 0.5484
##
## Kawasaki_disease Systemic_lupus_erythematosus 0.5352
##
## Bone_mineral_density Type_2_diabetes 0.5268
##
## Kawasaki_disease Multiple_sclerosis 0.501
##
## Kawasaki_disease Rheumatoid_arthritis 0.4775
##
## Celiac_disease Kawasaki_disease 0.4754
##
## LDL_cholesterol Triglycerides 0.4743
##
## Kawasaki_disease Ulcerative_colitis 0.4661
##
## Liver_enzyme_levels_gamma_glutamyl_transferase Urate_levels 0.4191
##
## Alzheimers_combined Bone_mineral_density 0.4149
## -----------------------------------------------------------------------------------------------
The similarity dendrogram can be divided into separate clusters:
## Cluster01 has 8 members
## Kawasaki_disease
## Systemic_lupus_erythematosus
## Celiac_disease
## Ulcerative_colitis
## Psoriasis
## Multiple_sclerosis
## Rheumatoid_arthritis
## Allergy
##
## Cluster02 has 9 members
## Systemic_sclerosis
## Primary_biliary_cirrhosis
## Atopic_dermatitis
## Juvenile_idiopathic_arthritis
## Ankylosing_spondylitis
## Crohns_disease
## Type_1_diabetes
## Autoimmune_thyroiditis
## Primary_sclerosing_cholangitis
##
## Cluster03 has 10 members
## Urate_levels
## Liver_enzyme_levels_gamma_glutamyl_transferase
## LDL_cholesterol
## HDL_cholesterol
## Triglycerides
## Renal_function_related_traits_BUN
## Platelet_counts
## Red_blood_cell_traits
## C_reactive_protein
## Fasting_glucose_related_traits
##
## Cluster04 has 12 members
## Chronic_kidney_disease
## Alzheimers_combined
## Bone_mineral_density
## Type_2_diabetes
## Vitiligo
## Migraine
## Alopecia_areata
## Asthma
## Creatinine_levels
## Behcets_disease
## Progressive_supranuclear_palsy
## Restless_legs_syndrome
##
The cluster columns (for example, “c1” and “c2”) show the average p-values of the associations between the group-specific SNP sets and the regulatory datasets. A “-” sign indicates that an association is underrepresented. The “adj.P.Val” column shows whether the difference in associations between the groups is statistically significant.
## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 54"
##
## ---------------------------------------------------------------------------------------------------------
## Row.names c1 c2 adj.P.Val V2
## -------------------------------------------------- --------- ------ ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 0.8227 0.0001666 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk 0.008529 0.9186 0.0002231 GM18951 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1 0.001167 0.6894 0.0002231 GM12878 FOXM1 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk 0.008903 0.9673 0.0002231 GM19099 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2 0.0001479 0.6107 0.0002929 GM12878 PML v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1 0.0008332 0.8376 0.0002929 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.003957 0.942 0.0003906 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1 6.767e-05 0.4192 0.0004104 GM12878 NFIC v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599 0.6266 0.0004104 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2 0.003147 0.7082 0.0004104 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
## ---------------------------------------------------------------------------------------------------------
##
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 56"
##
## ----------------------------------------------------------------------------------------------------------
## Row.names c1 c3 adj.P.Val V2
## -------------------------------------------------- --------- ------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 -0.3199 1.525e-06 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1 0.001167 -0.2378 1.978e-06 GM12878 FOXM1 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1 6.767e-05 -0.1347 3.878e-06 GM12878 NFIC v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk 0.008529 -0.6289 5.781e-06 GM18951 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2 0.0001479 -0.3198 8.173e-06 GM12878 PML v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2 0.003147 -0.3812 8.605e-06 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1 0.0008332 -0.3287 8.605e-06 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk 0.008903 -0.5609 9.529e-06 GM19099 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.003957 -0.3773 9.887e-06 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599 -0.4904 1.692e-05 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
## ----------------------------------------------------------------------------------------------------------
##
## [1] "c1 vs. c4 , number of degs significant at adj.p.val<0.5: 55"
##
## -----------------------------------------------------------------------------------------------------------
## Row.names c1 c4 adj.P.Val V2
## -------------------------------------------------- --------- -------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 -0.4205 1.012e-06 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2 0.0001479 -0.243 3.42e-06 GM12878 PML v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk 0.008529 -0.6439 3.42e-06 GM18951 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1 0.001167 -0.5047 3.42e-06 GM12878 FOXM1 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1 6.767e-05 -0.2602 3.42e-06 GM12878 NFIC v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk 0.008903 -0.5014 3.42e-06 GM19099 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599 -0.3133 3.42e-06 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2 0.003147 -0.4064 3.42e-06 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1 0.0008332 -0.4112 4.748e-06 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2 2.011e-05 -0.09226 5.47e-06 GM12878 MTA3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
## -----------------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
##
## ----------------------------
## c1 c2 c3 c4
## -------- ---- ---- ---- ----
## **c1** 0 54 56 55
##
## **c2** 0 0 0 0
##
## **c3** 0 0 0 0
##
## **c4** 0 0 0 0
## ----------------------------
Text mining question 2: Are the terms more strongly associated with the diseases in one cluster than in the other, based on the strength of the literature associations? Are the terms themselves related based on the literature? Expected answer: Yes, the literature associations should confirm the relationships.
| | C1 | C2 | C3 | C4 |
|---|---|---|---|---|
| C1 | | Cell types: Gm12878 Reg: NFkB, Pol2, MTA3, NFIC, NFATC1 | Cell types: Gm12878 Reg: NFkB, Pol2, MTA3, NFIC, NFATC1 | Cell types: Gm12878 Reg: NFkB, Pol2, MTA3, NFIC, NFATC1 |
| C2 | | | Nothing significant | Nothing significant |
| C3 | | | | Nothing significant |
| C4 | | | | |
Out of all regulatory datasets, we selected only histone marks.
## [1] 721 39
## [1] 610 39
Next, we visualized the heatmap of regulatory similarity and checked how well it correlates with the original clustering by shared genetic overlap:
## [1] 0.27127
The 10 pairs of autoimmune-associated SNP sets that are most similar to each other are listed below.
##
## ---------------------------------------------------------------------------------------
## Disease 1 Disease 2 Corr. coefficient
## --------------------------------- --------------------------------- -------------------
## HDL_cholesterol Triglycerides 0.621
##
## Rheumatoid_arthritis Ulcerative_colitis 0.4856
##
## HDL_cholesterol LDL_cholesterol 0.48
##
## HDL_cholesterol Platelet_counts 0.4609
##
## Platelet_counts Triglycerides 0.4504
##
## LDL_cholesterol Triglycerides 0.4151
##
## Creatinine_levels Renal_function_related_traits_BUN 0.3915
##
## Psoriasis Systemic_lupus_erythematosus 0.3911
##
## Renal_function_related_traits_BUN Urate_levels 0.3689
##
## Alopecia_areata C_reactive_protein 0.3686
## ---------------------------------------------------------------------------------------
The similarity dendrogram can be divided into separate groups:
## Cluster01 has 6 members
## Celiac_disease
## Multiple_sclerosis
## Kawasaki_disease
## Primary_biliary_cirrhosis
## Systemic_lupus_erythematosus
## Psoriasis
##
## Cluster02 has 14 members
## Type_2_diabetes
## Fasting_glucose_related_traits
## Red_blood_cell_traits
## Crohns_disease
## Migraine
## Systemic_sclerosis
## Ankylosing_spondylitis
## Platelet_counts
## Triglycerides
## HDL_cholesterol
## Vitiligo
## Progressive_supranuclear_palsy
## Liver_enzyme_levels_gamma_glutamyl_transferase
## LDL_cholesterol
##
## Cluster03 has 11 members
## Allergy
## Type_1_diabetes
## Primary_sclerosing_cholangitis
## Juvenile_idiopathic_arthritis
## Behcets_disease
## Ulcerative_colitis
## Rheumatoid_arthritis
## Autoimmune_thyroiditis
## Alopecia_areata
## C_reactive_protein
## Asthma
##
## Cluster04 has 8 members
## Bone_mineral_density
## Chronic_kidney_disease
## Alzheimers_combined
## Restless_legs_syndrome
## Atopic_dermatitis
## Urate_levels
## Renal_function_related_traits_BUN
## Creatinine_levels
##
The cluster columns (for example, “c1” and “c2”) show the average p-values of the associations between the group-specific SNP sets and the regulatory datasets. A “-” sign indicates that an association is underrepresented. The “adj.P.Val” column shows whether the difference in associations between the groups is statistically significant.
## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 44"
##
## ------------------------------------------------------------------------------------------------------------
## Row.names c1 c2 adj.P.Val V2
## ----------------------------------------------- --------- -------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212 -0.5773 5.236e-08 GM12875 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeBroadHistoneGm12878H3k9acStdPk 3.849e-12 -0.205 8.601e-07 GM12878 H3K9ac Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k4me2StdPk 8.782e-09 -0.06392 8.601e-07 GM12878 H3K4me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07 -0.3131 1.346e-06 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06 -0.3862 7.309e-06 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202 -0.7181 9.149e-06 GM12864 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeBroadHistoneDnd41H3k09acPk 0.0001527 -0.6869 1.924e-05 Dnd41 H3K9ac Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k04me3StdPkV2 4.708e-08 -0.167 2.222e-05 GM12878 H3K4me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k79me2StdPk 1.867e-08 -0.9988 3.557e-05 GM12878 H3K79me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k04me1StdPkV2 6.263e-15 -0.01015 3.557e-05 GM12878 H3K4me1 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
## ------------------------------------------------------------------------------------------------------------
##
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 54"
##
## -----------------------------------------------------------------------------------------------------------
## Row.names c1 c3 adj.P.Val V2
## ----------------------------------------------- --------- ------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212 0.8142 1.464e-06 GM12875 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeBroadHistoneGm12878H3k9acStdPk 3.849e-12 0.2791 2.209e-05 GM12878 H3K9ac Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07 0.6608 3.836e-05 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeBroadHistoneGm12878H3k4me2StdPk 8.782e-09 0.5631 5.197e-05 GM12878 H3K4me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k79me2StdPk 1.867e-08 -0.4995 7.28e-05 GM12878 H3K79me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06 0.7926 7.28e-05 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202 0.9054 7.28e-05 GM12864 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeBroadHistoneDnd41H3k09acPk 0.0001527 -0.9118 7.28e-05 Dnd41 H3K9ac Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneDnd41H3k04me1Pk 9.105e-08 -0.7878 0.0001171 Dnd41 H3K4me1 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm06990H3k4me3StdHotspotsRep1 0.0001636 -0.9922 0.0005313 GM06990 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
## -----------------------------------------------------------------------------------------------------------
##
## [1] "c1 vs. c4 , number of degs significant at adj.p.val<0.5: 55"
##
## -------------------------------------------------------------------------------------------------------------
## Row.names c1 c4 adj.P.Val V2
## ----------------------------------------------- --------- --------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212 -0.3919 2.056e-07 GM12875 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeBroadHistoneGm12878H3k4me2StdPk 8.782e-09 -0.02811 4.143e-06 GM12878 H3K4me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k9acStdPk 3.849e-12 -0.1383 4.143e-06 GM12878 H3K9ac Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k79me2StdPk 1.867e-08 -0.005656 5.047e-06 GM12878 H3K79me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07 -0.2947 8.699e-06 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeBroadHistoneGm12878H3k04me3StdPkV2 4.708e-08 -0.01621 2.43e-05 GM12878 H3K4me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202 -0.5261 2.535e-05 GM12864 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06 -0.3212 2.572e-05 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeBroadHistoneDnd41H3k09acPk 0.0001527 -0.3194 2.572e-05 Dnd41 H3K9ac Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneDnd41H3k04me1Pk 9.105e-08 -0.06965 3.248e-05 Dnd41 H3K4me1 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
## -------------------------------------------------------------------------------------------------------------
##
## [1] "c2 vs. c3 , number of degs significant at adj.p.val<0.5: 18"
##
## -------------------------------------------------------------------------------------------------------
## Row.names c2 c3 adj.P.Val V2
## ------------------------------------------ -------- --------- ----------- -----------------------------
## wgEncodeBroadHistoneA549H3k79me2Dex100nmPk 0.008678 -0.02115 0.01579 A549 DEX 100 nM H3K79me2
## Histone Mods by ChIP-seq
## Peaks from ENCODE/Broad
##
## wgEncodeBroadHistoneHsmmH3k27me3StdPk -0.02327 3.654e-07 0.02165 HSMM H3K27me3 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneNhaH3k27me3StdPk -0.01506 0.0001926 0.02689 NH-A H3K27me3 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneA549H3k36me3Dex100nmPk 0.1047 -0.004845 0.02689 A549 DEX 100 nM H3K36me3
## Histone Mods by ChIP-seq
## Peaks from ENCODE/Broad
##
## wgEncodeBroadHistoneNhlfH3k79me2Pk 0.005332 -0.05179 0.03198 NHLF H3K79me2 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneK562H3k36me3StdPk 0.001587 -0.009793 0.03505 K562 H3K36me3 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneHsmmtH3k09me3Pk -0.02596 0.003835 0.04333 HSMMtube H3K9me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneA549H3k27me3Etoh02Pk -0.01807 0.002645 0.04472 A549 EtOH 0.02% H3K27me3
## Histone Mods by ChIP-seq
## Peaks from ENCODE/Broad
##
## wgEncodeBroadHistoneHsmmtH3k27me3Pk -0.01922 0.001151 0.04472 HSMMtube H3K27me3 Histone
## Mods by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneNhdfadH4k20me1Pk 0.001681 -0.02737 0.04472 NHDF-Ad H4K20me1 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
## -------------------------------------------------------------------------------------------------------
##
## [1] "c2 vs. c4 , number of degs significant at adj.p.val<0.5: 5"
##
## -----------------------------------------------------------------------------------------------------
## Row.names c2 c4 adj.P.Val V2
## ---------------------------------------- ------- ---------- ----------- -----------------------------
## wgEncodeBroadHistoneNhdfadH3k36me3StdPk 0.1009 -0.005215 0.02027 NHDF-Ad H3K36me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneNhekH3k9me1StdPk 0.2002 -0.001186 0.02253 NHEK H3K9me1 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneHmecH3k36me3StdPk 0.01773 -0.0002847 0.04273 HMEC H3K36me3 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k36me3StdPk 0.2219 -0.001063 0.05838 GM12878 H3K36me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneK562NcorPk 0.04478 -0.005174 0.0846 K562 NCoR Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
## -----------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
##
## ----------------------------
## c1 c2 c3 c4
## -------- ---- ---- ---- ----
## **c1** 0 44 54 55
##
## **c2** 0 0 18 5
##
## **c3** 0 0 0 0
##
## **c4** 0 0 0 0
## ----------------------------
Text mining question 2: Are the terms more strongly associated with the diseases in one cluster than in the other, based on the strength of the literature associations? Are the terms themselves related based on the literature? Expected answer: Yes, the literature associations should confirm the relationships.
| | C1 | C2 | C3 | C4 |
|---|---|---|---|---|
| C1 | | Cell types: Gm12878, CD20+ Reg: H3K4me1, H3K9me3, H3K9ac, H3K27ac, H2az, H3K4me2 | Cell types: Gm12878, CD20+ Reg: H3K4me1, H3K9me3, H3K9ac, H3K27ac, H2az, H3K4me2 | Cell types: Gm12878, CD20+ Reg: H3K4me1, H3K9me3, H3K9ac, H3K27ac, H2az, H3K4me2 |
| C2 | | | Cell types: K562, NHEK, NHDF-Ad, NH-A, HMEC Reg: H3K36me3, H4K20me1, H3K79me2 | Nothing significant |
| C3 | | | | Nothing significant |
| C4 | | | | |